969 research outputs found

The Long-Short Story of Movie Description

Author: A Farhadi
A Kojima
A Rohrbach
Anna Rohrbach
C Fellbaum
P Young
S Hochreiter
Publication date: 01/01/2015

Generating descriptions for videos has many applications, including assisting blind people and human-robot interaction. Recent advances in image captioning, as well as the release of large-scale movie description datasets such as MPII Movie Description, make it possible to study this task in more depth. Many of the proposed methods for image captioning rely on pre-trained object classifier CNNs and Long Short-Term Memory recurrent networks (LSTMs) for generating descriptions. While image description focuses on objects, we argue that it is important to distinguish verbs, objects, and places in the challenging setting of movie description. In this work we show how to learn robust visual classifiers from the weak annotations of the sentence descriptions. Based on these visual classifiers, we learn how to generate a description using an LSTM. We explore different design choices to build and train the LSTM and achieve the best performance to date on the challenging MPII-MD dataset. We compare and analyze our approach and prior work along various dimensions to better understand the key challenges of the movie description task.

arXiv.org e-Print Archive

Crossref

CISPA – Helmholtz-Zentrum für Informationssicherheit

MPG.PuRe

Much Ado About Time: Exhaustive Annotation of Temporal Data

Author: Farhadi Ali
Gupta Abhinav
Laptev Ivan
Russakovsky Olga
Sigurdsson Gunnar A.
Publication date: 01/01/2016

Large-scale annotated datasets allow AI systems to learn from and build upon the knowledge of the crowd. Many crowdsourcing techniques have been developed for collecting image annotations. These techniques often implicitly rely on the fact that a new input image takes a negligible amount of time to perceive. In contrast, we investigate and determine the most cost-effective way of obtaining high-quality multi-label annotations for temporal data such as videos. Watching even a short 30-second video clip requires a significant time investment from a crowd worker; thus, requesting multiple annotations following a single viewing is an important cost-saving strategy. But how many questions should we ask per video? We conclude that the optimal strategy is to ask as many questions as possible in a HIT (up to 52 binary questions after watching a 30-second video clip in our experiments). We demonstrate that while workers may not correctly answer all questions, the cost-benefit analysis nevertheless favors consensus from multiple such cheap-yet-imperfect iterations over more complex alternatives. When compared with a one-question-per-video baseline, our method is able to achieve a 10% improvement in recall (76.7% ours versus 66.7% baseline) at comparable precision (83.8% ours versus 83.0% baseline) in about half the annotation time (3.8 minutes ours compared to 7.1 minutes baseline). We demonstrate the effectiveness of our method by collecting multi-label annotations of 157 human activities on 1,815 videos. Comment: HCOMP 2016 Camera Read

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

Association for the Advancement of Artificial Intelligence: AAAI Publications

Move Forward and Tell: A Progressive Generator of Video Descriptions

Author: A Farhadi
A Geiger
A Rohrbach
G Kulkarni
J Steinberger
L Wang
M Regneri
Publication date: 26/07/2018

We present an efficient framework that can generate a coherent paragraph to describe a given video. Previous works on video captioning usually focus on video clips. They typically treat an entire video as a whole and generate the caption conditioned on a single embedding. In contrast, we consider videos with rich temporal structures and aim to generate paragraph descriptions that can preserve the story flow while being coherent and concise. Towards this goal, we propose a new approach, which produces a descriptive paragraph by assembling temporally localized descriptions. Given a video, it selects a sequence of distinctive clips and generates sentences thereon in a coherent manner. In particular, the selection of clips and the production of sentences are done jointly and progressively, driven by a recurrent network: what to describe next depends on what has been said before. Here, the recurrent network is learned via self-critical sequence training with both sentence-level and paragraph-level rewards. On the ActivityNet Captions dataset, our method demonstrated the capability of generating high-quality paragraph descriptions for videos. Compared to those by other methods, the descriptions produced by our method are often more relevant, more coherent, and more concise. Comment: Accepted by ECCV 201

arXiv.org e-Print Archive

Crossref

Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding

Author: Farhadi Ali
Gupta Abhinav
Laptev Ivan
Sigurdsson Gunnar A.
Varol Gül
Wang Xiaolong
Publication date: 26/07/2016

Computer vision has great potential to help in our daily lives by searching for lost keys, watering flowers, or reminding us to take a pill. To succeed with such tasks, computer vision methods need to be trained from real and diverse examples of our daily dynamic scenes. Because most such scenes are not particularly exciting, they typically do not appear on YouTube, in movies, or in TV broadcasts. So how do we collect sufficiently many diverse but boring samples representing our lives? We propose a novel Hollywood in Homes approach to collect such data. Instead of shooting videos in the lab, we ensure diversity by distributing and crowdsourcing the whole process of video creation, from script writing to video recording and annotation. Following this procedure, we collect a new dataset, Charades, with hundreds of people recording videos in their own homes, acting out casual everyday activities. The dataset is composed of 9,848 annotated videos with an average length of 30 seconds, showing activities of 267 people from three continents. Each video is annotated by multiple free-text descriptions, action labels, action intervals, and classes of interacted objects. In total, Charades provides 27,847 video descriptions, 66,500 temporally localized intervals for 157 action classes, and 41,104 labels for 46 object classes. Using this rich data, we evaluate and provide baseline results for several tasks, including action recognition and automatic description generation. We believe that the realism, diversity, and casual nature of this dataset will present unique challenges and new opportunities for the computer vision community.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

Evaluation of soil-tire interaction on a soil bin

Author: Alimardani R.
Farhadi Payam
Mohsenimanesh A.
Publication venue: International Commission of Agricultural and Biosystems Engineering
Publication date: 24/01/2013

A single-wheel tester, designed with attention to the size of the soil bin, was fabricated to study soil-tire interactions in a controlled soil environment. The main parts of the single-wheel tester include the chassis, reduction gear unit, three-phase AC electric motor, hydraulic cylinder, tank, pump and valve, load cell, and tires. The experiment was designed with two levels of tire axle load (15 and 25 kN) and two inflation pressures (70 and 150 kPa). The tire (18.4/15-30) was run at a constant forward speed of 0.3 m s⁻¹, 13% slip, and 12% moisture content (d.b.) on clay loam soil. A statistical comparison was made for the cone index values measured in the undisturbed soil, at the center of the track, and at the edge of the track. A significant difference in cone index was found for all treatments. Inflation pressure at the center and load at the edge of the tire track have a significant effect on cone index and dry bulk density. Keywords: cone index, inflation pressure, load, dry bulk density, soil bi

Agricultural Engineering International (E-Journal, CIGR - International Commission of Agricultural Engineering)

The effects of alfalfa particle size and acid treated protein on ruminal chemical composition, liquid, particulate, escapable and non escapable phases in Zel sheep

Author: Farhadi A
Golchin-Gelehdooni S
Teimouri-Yansari A
Publication venue: 'African Journals Online (AJOL)'
Publication date: 12/11/2013

This study was conducted to investigate the effects of alfalfa particle size (long vs. fine) and canola meal treated with hydrochloric acid solution (untreated vs. treated) on ruminal chemical composition, liquid, particulate, escapable and non escapable phases in Zel sheep. Four ruminally cannulated sheep received a mixed diet (% of dry matter) consisting of 23.73 alfalfa, 8.70 canola meal, 39.56 wheat straw, 13.45 beet pulp, 13.45 barley grain, and 1 mineral-vitamin mixture. The experimental design was a 4 × 4 Latin square with 22-day periods. The diet was offered twice daily (09:00 and 21:00 h). The rumens were evacuated manually at 3, 7.5 and 12 h post-feeding, and total ruminal contents were separated into mat and liquids. Dry matter weight distribution of total recovered particles was determined by a wet-sieving procedure and used to partition ruminal mat and liquids among percentages of large (≥ 6.35 mm), medium (< 6.35 and ≥ 1.18 mm), and small (< 1.18 and ≥ 0.5 mm) particles. Lyophilized ruminal digesta were analyzed for chemical composition, especially for CP, NDF and EE. No interactions (P > 0.05) between dietary particle size and acid level were observed for ruminal chemical composition, liquid, particulate, escapable and non escapable phases. Treatment of canola meal and an increase in particle size reduced the values of CP. Generally, as time after feeding increased, the values of each nutrient decreased. Particle size and time post-feeding had a pronounced effect on the distribution of different particle fractions, whereas acid level did not influence it. As time after feeding increased, the percentage of particles ≥ 6.35 mm decreased, whereas the percentage of particles < 6.35 mm increased, illustrating intensive particle breakdown in the reticulo-rumen.
Particle size and time post-feeding had a pronounced effect on the total mass of ruminal digesta, ruminal mat and liquid part; fine particles and 12 h post-feeding resulted in the lowest rumen mat. Time post-feeding and acid level did not significantly influence pH, whereas pH increased as particle size increased. Key words: Canola meal, particle size, rumen mat, escapable, non escapable phase

AJOL - African Journals Online

Risk-Based Capacitor Placement in Distribution Networks

Author: Elyasi E.
Estebsari A.
Falaghi H.
Farhadi M.
Ramezani M.
Publication venue: 'MDPI AG'
Publication date: 01/01/2022

In this paper, the problem of sizing and placement of constant and switching capacitors in electrical distribution systems is modelled considering the load uncertainty. This model is formulated as a multicriteria mathematical problem. The risk of voltage violation is calculated, and the stability index is modelled using fuzzy logic and fuzzy equations. The instability risk is introduced as the deviation of our fuzzy-based stability index with respect to the stability margin. The capacitor placement objectives in our paper include: (i) minimizing investment and installation costs as well as loss cost; (ii) reducing the risk of voltage violation; and (iii) reducing the instability risk. The proposed mathematical model is solved using a multi-objective version of a genetic algorithm. The model is implemented on a distribution network, and the results of the experiment are discussed. The impacts of constant and switching capacitors are assessed separately and concurrently. Moreover, the impact of uncertainty on the multiple objectives is determined based on a sensitivity analysis. It is demonstrated that the greater the uncertainty, the higher the system cost, the voltage risk, and the instability risk.

LSBU Research Open

Insulin-like growth factor I gene polymorphism associated with growth traits in beluga (Huso huso) fish

Author: Farhadi A.
Rahimi-Mianji Gh.
Yeganeh S.
Zeinaddini Meymand Z.
Publication date: 01/01/2017

The aim of the present study was to detect polymorphism in the insulin-like growth factor I (IGF-I) gene of beluga (Huso huso) fish using the PCR-SSCP technique and to investigate its association with growth traits (condition factor, body length and weight). A total of 150 specimens of beluga were randomly selected, and DNA was isolated from the caudal fin using a modified salting-out method. Two fragments of 171 and 362 bp, from the 5'-UTR and 3'-UTR regions of the IGF-I gene respectively, were then amplified. Genotyping of individuals by the SSCP technique showed five banding patterns (A, B, C, D and E) for the 5'-UTR region, with frequencies of 29.2, 0.76, 16.92, 51.53 and 10% respectively in one-year-old fish, and three banding patterns (A, C and D), with frequencies of 45, 10 and 45%, in two-year-old fish. Three banding patterns (A, B and C) were also seen for the 3'-UTR region, with frequencies of 62.3, 27.69 and 10.76% in one-year-old fish and 20, 60 and 20% in two-year-old fish. The A banding pattern in the 3'-UTR and the D banding pattern in the 5'-UTR were the most frequent patterns in the studied beluga population. The association analysis using SAS statistical software indicated no significant association between observed banding patterns and growth traits (body length, weight, and condition factor) in beluga. Considering the important role of IGF-I as a probable candidate gene affecting growth-related traits, these marker sites should be studied further in larger samples and also in other regions of the gene.

Aquatic Commons